-
- 1. Minutes of the OpStats Working Group: March 1991 IETF
-
-
- Chairpersons: Bernard Stockman of NORDUnet and Phill Gross of CNRI
- Notetaker: Dan Friedman of BBN Communications.
-
-
-
- The OSWG (Operational Statistics Working Group) met for three sessions.
- The following report summarizes the proceedings. It is organized along the
- lines of "Accomplishments", "Issues" and "Process" rather than as a
- sequential narrative. At the request of the chairpersons, the minutes contain
- proposals to resolve some of the open issues: basically, a (concrete) cut at
- what we should do next.
-
- 2. Summary of Accomplishments
-
- Our main accomplishments were to agree upon objectives for the work and
- to take some steps towards realizing those objectives. The objectives are
-
- - To define an architecture for providing Internet access to operational
- statistics for any Regional or the NSFnet.
-
- - To classify the types of information that should be available
-
- - To develop (or foster the development of) public domain software
- providing this information. The aim here is to specify a baseline
- capability that all the Regionals can support with minimal development
- effort and minimal ongoing effort. (It is hoped that if they can do it with
- minimal effort, they in fact will.)
-
- Our progress in each of these areas is described next.
-
- 2.1. Architecture
-
- We selected a client/server architecture for providing Internet access to
- operational statistics, as shown in the figure.
-
- This architecture envisions that each NOC will have a server that provides
- locally collected information in a variety of forms (along the "raw <-->
- processed" continuum) to clients. High-level proposals for the client/server
- interaction and functionality for the "first release" of the software are
- discussed later in the minutes.
-
- 2.2. Classification of Opstats Information
-
- We identified three classes of reports based upon prospective audiences. They
- are:
-
- - Monthly Reports (a.k.a. "Political Reports") aimed at Management.
- - Weekly Reports aimed at Engineering (i.e. planning)
- - Daily Reports aimed at Operations
-
- 2.3. Development Plan
-
- We decided that it was most important and easiest to address the
- management reports first, and therefore, we spent the most time focussing on
- them. We arrived at several key areas:
-
- - Offered Load (i.e. traffic at external interfaces)
- - Offered Load segmented by "Customer"
- - Offered Load segmented by protocol/application
- - Resource Utilization (Link/Router)
- - Availability
-
- The first report came to be known as the "McDonald's Report" (N Billion
- Bytes/Packets Served).
-
- 3. Technical Issues
-
- 3.1. Client/Server Interaction
-
- The following was proposed for Client/Server Commands. (The initial
- proposal was put forth by Dan Long of NEARnet.)
-
- Commands:
- - Login (with authentication)
- - Help -- Returns a description of the available data (names, a pointer
- to a map, gateways, interfaces, and variables)
- - Format -- Defines retrieval format
- - Select/Retrieve -- Pose a query to server. (This generates a response
- containing the data.)
- - Exit.
-
- Proposed Query Language:
-
- "SQL-like": SELECT <router interface> AND <variable> FROM
- <startdate> TO <enddate> AT <granularity> WITH <conditions-met>
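To make the proposed query shape concrete, here is a minimal Python sketch that renders and parses queries of the form above. It is not the group's syntax definition: the ISO date format, the single-token interface/variable/granularity fields, and the example names (`rtr1.eth0`, `ifInOctets`, `day`) are all assumptions for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date


# Hypothetical in-memory form of the proposed "SQL-like" query. Keyword
# spelling follows the minutes; field names and date format are assumptions.
@dataclass
class Query:
    interface: str        # router interface, in the NOC's internal naming
    variable: str         # statistic to retrieve (illustrative)
    start: date
    end: date
    granularity: str      # e.g. "15min" or "day" (illustrative)
    conditions: str = ""  # free-form; empty means no WITH clause

    def to_text(self) -> str:
        text = (f"SELECT {self.interface} AND {self.variable} "
                f"FROM {self.start.isoformat()} TO {self.end.isoformat()} "
                f"AT {self.granularity}")
        if self.conditions:
            text += f" WITH {self.conditions}"
        return text


# One regular expression covering the whole query; WITH is optional.
_PATTERN = re.compile(
    r"SELECT (?P<interface>\S+) AND (?P<variable>\S+) "
    r"FROM (?P<start>\d{4}-\d{2}-\d{2}) TO (?P<end>\d{4}-\d{2}-\d{2}) "
    r"AT (?P<granularity>\S+)(?: WITH (?P<conditions>.+))?$"
)


def parse_query(text: str) -> Query:
    """Parse a query string back into a Query, rejecting malformed input."""
    m = _PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"not a valid query: {text!r}")
    return Query(
        interface=m["interface"],
        variable=m["variable"],
        start=date.fromisoformat(m["start"]),
        end=date.fromisoformat(m["end"]),
        granularity=m["granularity"],
        conditions=m["conditions"] or "",
    )
```

For example, `Query("rtr1.eth0", "ifInOctets", date(1991, 3, 1), date(1991, 3, 31), "day").to_text()` yields a one-line query that a server could log, echo back, or hand to `parse_query`.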
-
- The authentication issue was considered important because some of the traffic
- information (i.e. who's talking how much to whom) will be sensitive.
- We also felt that the "name/map" issue is important for the following
- reasons. It will be impossible to agree on a naming structure that is
- universally meaningful. Even if we could agree on such a convention, it will
- always be most convenient for the local network operators to maintain
- information using names that are meaningful to them. Therefore, the server
- should be permitted to deliver results using its internal names but must be able
- to provide file(s) that enable a person to figure out what the names mean.
-
- Notetaker's Proposal:
- Maintain the following information in one or more files. Pointers to
- information are obtained by the Help command.
-
- Router names:
- Gives the name of the router as used in the statistics data.
- Gives a (human-supplied) description of the router's location, e.g.
- University XYZ, MegaBig International Corporate Headquarters, or some
- other information that enables an outsider to determine what role the
- router is playing in the network. This information embodies the
- knowledge contained in the network operators' heads.
-
- Net names:
- Provides the (internal) names of the networks attached to
- the routers' external interfaces, with their associated IP addresses.
- (Router names can be internal here, since the Router names information
- provides the mapping.)
-
- Links:
- An ASCII file containing the backbone point-to-point links (using router names
- to specify endpoints). If a link also has an internal name that will be
- used when providing link information, give this name. Also give the
- line speed. We still need a way to specify a connection to a public data
- service. All data provided by the server is given using internal names.
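A minimal Python sketch of how a client might load the Router names and Links files and translate internal names into something an outsider can read. The colon-separated line layout and all field names are hypothetical, since the minutes do not fix a file format; a Net names file would be handled the same way.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical layouts (one record per non-blank line):
#   router file:  <router-name>:<human description>
#   links file:   <router-a>:<router-b>:<link-name or empty>:<line speed in bit/s>
@dataclass
class Router:
    name: str         # internal name used in the statistics data
    description: str  # human-supplied location/role


@dataclass
class Link:
    end_a: str        # internal router names of the two endpoints
    end_b: str
    name: str         # internal link name, "" if the link has none
    speed_bps: int


def parse_routers(text: str) -> dict[str, Router]:
    routers = {}
    for line in text.splitlines():
        if line.strip():
            name, description = line.split(":", 1)
            routers[name.strip()] = Router(name.strip(), description.strip())
    return routers


def parse_links(text: str) -> list[Link]:
    links = []
    for line in text.splitlines():
        if line.strip():
            a, b, name, speed = (field.strip() for field in line.split(":"))
            links.append(Link(a, b, name, int(speed)))
    return links


def describe_link(link: Link, routers: dict[str, Router]) -> str:
    """Replace internal router names with their human descriptions."""
    label = link.name or f"{link.end_a}-{link.end_b}"
    a = routers[link.end_a].description
    b = routers[link.end_b].description
    return f"{label}: {a} <-> {b} at {link.speed_bps} bit/s"
```

The point of the design is the one argued above: statistics keep the NOC's own names, and only the presentation step consults the map.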
-
- 3.2. Contents of Monthly Reports
-
- We had three presentations on the Monthly Reports. (The groups were
- commended for their pioneering use of the 11PM-2AM time slot.) Members
- of the groups were
-
- - Kannan Varadhan, Eric Carroll, Bill Norton,
- Vikas Aggarwal
-
- - Sue Hares, et al. (Sorry, that's all I have on the hardcopy.)
-
- - Charles Carvalho, Ross Veach, David O'Leary
-
- The following is a synthesis of the presentations and attendant discussions:
-
- 3.2.1. The McDonald's report
-
- The main issues here were: whether to provide packets or bytes or both and
- whether to provide input or output or both.
-
- Notetaker's Opinion:
- I was convinced by the argument that, unless
- something is radically wrong with the network, differences between input
- and output should be "down in the noise", and the explanations for the
- differences will be too obscure for a management report. (If the network is
- really throwing away a large amount of traffic, we'll hear about it well
- before a management report has to be written.) So I vote for input only in the
- McDonald's Report. More on bytes vs. packets later.
-
- 3.2.2. Offered Load by Customer
-
- There was agreement that this is useful. The main controversy was how
- customers should be identified in a publicly available report.
-
- Notetaker's Proposal:
- We present the cumulative distribution or density
- function of offered load vs. number of interfaces. That is: sort the
- interfaces in decreasing order of offered load. Plot either the function F(n),
- where F(n) is the percentage of total traffic offered by the top n interfaces, or
- the function f(n), where f(n) is the percentage of traffic offered by the
- n'th-ranked interface. (An example appears toward the end of the minutes.)
-
- I feel that the cumulative is useful as an overview of how the traffic is
- distributed among users, since it enables you to quickly pick off what
- fraction of the traffic comes from what number of "users." (It will be
- technically and politically difficult to resolve "user" below the level of
- "interface.") This graph will suggest more detailed explorations to people
- who have access to customer "names."
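A short Python sketch of the F(n) and f(n) computation described above, assuming the input is a mapping from (hypothetical) interface names to total offered load for the period. List index n-1 holds the value for rank n.

```python
from __future__ import annotations

from itertools import accumulate


def offered_load_distribution(load: dict[str, float]) -> tuple[list[float], list[float]]:
    """Return (f, F) for interfaces ranked by decreasing offered load.

    f[n-1] is f(n): percentage of total traffic offered by the n'th-ranked interface.
    F[n-1] is F(n): percentage of total traffic offered by the top n interfaces.
    """
    total = sum(load.values())
    ranked = sorted(load.values(), reverse=True)  # interface names are dropped here
    f = [100.0 * x / total for x in ranked]
    F = list(accumulate(f))                       # running sum of f
    return f, F
```

Because the names are discarded after sorting, the published graph reveals the shape of the distribution without identifying customers, which is the property the proposal relies on.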
-
- 3.2.3. Offered Load by Protocol Type and Application
-
- People seemed to agree that this is valuable and that pie charts are a good way
- to present the information (since there is no "natural" ordering for the
- elements of the X-axis, a.k.a "Category Axis" in spreadsheet lingo.) "By
- protocol" means TCP, UDP etc. "By application" means Telnet, FTP, SMTP
- etc. It was also pointed out that it is potentially useful to do this both by
- packets and by bytes since the two profiles could be very different (e.g. FTP
- typically uses large packets, Telnet small packets etc.)
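A minimal Python sketch of the two pie-chart profiles (by bytes and by packets) computed from per-application totals. The application names and counts in the usage note are illustrative only, not measured data.

```python
from __future__ import annotations


def percentage_shares(totals: dict[str, int]) -> dict[str, float]:
    """Each category's percentage of the grand total (the slices of one pie)."""
    grand = sum(totals.values())
    return {name: 100.0 * value / grand for name, value in totals.items()}


def application_profiles(counts: dict[str, tuple[int, int]]):
    """counts[app] = (octets, packets); returns (shares by bytes, shares by packets)."""
    by_bytes = percentage_shares({app: c[0] for app, c in counts.items()})
    by_packets = percentage_shares({app: c[1] for app, c in counts.items()})
    return by_bytes, by_packets
```

With made-up totals such as `{"ftp": (900_000, 1_000), "telnet": (100_000, 3_000)}`, FTP is 90% of the bytes but only 25% of the packets, which is the kind of divergence that motivates producing both charts.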
-
- 3.2.4. Resource Utilization
-
- Everyone agreed that the objectives of this report should be to provide some
- indication of whether the network has congestion and if/where it needs more
- capacity. There was considerable debate on exactly how often one would have
- to poll utilization to determine whether there is congestion and also on
- exactly what summary statistics to present: averages, peaks, peak of peaks,
- peak of averages, averages of peaks, peaks of averages of peaks.....
- We seemed to focus more on link utilization than on router utilization,
- probably for two reasons: it is more difficult to standardize measures of
- router utilization, and link costs dominate router costs.
- We kept looking for some underlying "physics" of networks to determine the
- collection interval. Here's one opinion.
-
- Notetaker's Opinion:
- It will be impractical to determine congestion solely
- from link utilization, since one would have to collect at a very small interval
- (certainly less than one minute). Therefore, we should estimate
- congestion by looking at dropped-packet statistics.
-
- We should use link utilization to capture information on network loading.
- The polling interval must be small enough to resolve variations in human
- activity, since that activity is what drives variations in network loading.
- On the other hand, there is no need to make it smaller than the shortest
- interval over which excessive delay would noticeably impact productivity.
- For example, people won't notice congestion if it only occurs for 10 seconds
- a day.
-
- 30 minutes is a good estimate for the time at which people remain in one
- activity and over which prolonged high delay will affect their productivity.
- To track 30 minute variations, we need to sample twice as frequently, i.e.
- every 15 minutes.
-
- 3.2.5. Availability
-
- We didn't have much time to get to this. There was discussion of presenting
- the information "By Customer" (e.g. Customers with Top N Total Outage
- Times) or just reporting on # outages that last longer than a certain amount
- of time.
-
- Notetaker's Proposal:
- We should omit Availability reports from the first deployment for several
- reasons. First, we didn't spend enough time to obtain consensus. Second, they
- can be politically sensitive. Third, outage data can be very tough to process.
- Think of trying to determine exactly how a network partition affects
- connectivity between different pairs of end users. It's an "N-Squared"
- problem. If we do want to address this, we should start with site, router, and
- external interface outages only, since these are O(N) problems.
-
- 4. Development Proposal
-
- The following is a proposal for a "development/deployment" plan that tries
- to reach a reasonable compromise among functionality, burden on network
- operations resources, and "time to market." The discussion is segmented into
- three parts:
-
- - What information is to be available through the server
- - What are the collection/storage requirements
- - What presentation tools should we build
-
- 4.1. Information Base
-
- The Server piece, whose goal is to provide access to data in a fairly raw
- form (described next), should be built first. Presentation
- tools that use this data as input can be developed in parallel if people want,
- but we shouldn't put them on the critical path.
- We will have to provide the collection tools as well (unless every NOC is
- already collecting enough data to supply the information outlined below.)
- The capabilities of the "first release" are to support the
-
- - McDonald's Report
- - Offered Load by Interface Report
- - Offered Load by Application Report
- - Link Utilization Report
- - Congestion Report
-
- The Availability Report is missing because it is hard to do and (based upon
- the level of discussion we had) seemed to be of lower priority.
- In the first release, we provide a server and client that can deliver the
- following statistics for N specified days over a rolling three-month interval:
-
- - Total Input Packets and Input Octets per day per external interface.
- - Total Input Packets and Octets across the network per day per
- application. (Note that this is NOT per interface.)
- - Mean, Standard Deviation, and Peak 15-minute utilization per day per
- unidirectional link.
- - Peak discard percentages over fifteen-minute intervals per link-direction
- per day.
-
- The Exchange Format between Server and Client should be ASCII-based
- because this enables people to quickly look at the data to see if it makes
- sense and because it enables quick, custom data reduction via AWK. (I have
- found both these capabilities to be useful in my own analyses of network data.)
- The first Client that we write should simply retrieve the data in the exchange
- format and write it to disk. Rationale for this Base:
-
- This information supports the reports described below and then some, so that
- presentation tool development will not be limited to these reports.
- The three-month collection interval is short enough to keep storage
- requirements under 5 Mbytes but long enough so that one can examine
- longer-term trends by "dumping" the data a few times a year. (These files
- should be highly compressible, easily 2:1, since they'll contain mainly ASCII
- numerals, repetitions of the names of entities, and whitespace, colons etc.)
- The ASCII-based format will enable us to develop interoperable tools more
- quickly.
-
- TBD:
-
- - The exact exchange format (no real opinion here other than that it be
- ASCII-based).
- - The command structure. The proposed format seems to be an excellent
- starting point.
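Since the exact exchange format is still TBD, the following Python sketch only illustrates what "ASCII-based and AWK-friendly" could look like. The record layout `<kind> <entity> <day> name=value ...` and every field name are hypothetical.

```python
from __future__ import annotations

# Hypothetical record layout, one record per line, whitespace-separated:
#   <kind> <entity> <day> <name>=<value> <name>=<value> ...


def format_record(kind: str, entity: str, day: str, values: dict[str, int | float]) -> str:
    fields = " ".join(f"{name}={value}" for name, value in values.items())
    return f"{kind} {entity} {day} {fields}"


def parse_record(line: str) -> tuple[str, str, str, dict[str, int | float]]:
    kind, entity, day, *fields = line.split()
    values: dict[str, int | float] = {}
    for field in fields:
        name, text = field.split("=", 1)
        try:
            values[name] = int(text)      # counters and totals
        except ValueError:
            values[name] = float(text)    # utilizations, percentages
    return kind, entity, day, values
```

Because the first three fields are fixed, a one-liner such as `awk '$1 == "iface"'` can pull out all per-interface records without any special tooling, which is the quick custom data reduction the minutes ask for.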
-
- 4.2. Collection/Storage Requirements:
-
- Input bytes and packets per external interface must be collected frequently
- enough to prevent counter overflow. As they are collected, they can be added
- to running totals for the day. At the end of the day, the daily totals for each
- external interface are stored.
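A minimal Python sketch of the overflow-safe accumulation described above, assuming 32-bit counters that wrap to zero (as SNMP's Counter type does) and that at most one wrap occurs between consecutive polls. The function names are hypothetical.

```python
from __future__ import annotations

COUNTER_MODULUS = 2 ** 32  # assumption: 32-bit counters that wrap to zero


def counter_delta(previous: int, current: int) -> int:
    """Increase between two polls, correct if the counter wrapped at most once."""
    return (current - previous) % COUNTER_MODULUS


def daily_total(samples: list[int]) -> int:
    """Add successive poll-to-poll deltas into a running total for the day."""
    return sum(counter_delta(a, b) for a, b in zip(samples, samples[1:]))


def max_poll_interval_s(link_bps: float) -> float:
    """Seconds for an octet counter to wrap once with the link running flat out."""
    return COUNTER_MODULUS * 8 / link_bps
```

For a T1 (1.544 Mbit/s), `max_poll_interval_s(1_544_000)` is roughly 22,250 seconds, a little over six hours, so even modest polling intervals avoid ambiguous wraps on links of that speed.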
-
- Input bytes and packets per application over all interfaces must also be
- collected frequently enough to prevent overflow. At the end of the day these can
- be aggregated into daily totals. (I guess you have to collect these per external
- interface, but they can be aggregated into a network-wide total as the day goes on.)
-
- Per link interface per 15 minutes: bytes sent, packets sent, packets received.
- (To get the drop rate, you have to correlate sent and received at the two ends
- of the link.) At the end of the day, store away the average utilization, the
- standard deviation, the peak utilization, and the peak drop percentage.
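A Python sketch of the end-of-day reduction for one link-direction, under these assumptions: the counters at both ends are already aligned to the same 15-minute intervals, the standard deviation is the population form, and the helper names are hypothetical. Misaligned clocks could make a drop percentage slightly negative; the sketch does not correct for that.

```python
from __future__ import annotations

import statistics

INTERVAL_S = 15 * 60  # 15-minute collection interval


def utilization(octets_sent: int, link_bps: float, interval_s: float = INTERVAL_S) -> float:
    """Fraction of link capacity used during one interval."""
    return octets_sent * 8 / (link_bps * interval_s)


def drop_percentage(sent_at_near_end: int, received_at_far_end: int) -> float:
    """Packets sent at one end but not received at the other, as a percentage."""
    if sent_at_near_end == 0:
        return 0.0
    return 100.0 * (sent_at_near_end - received_at_far_end) / sent_at_near_end


def daily_summary(octets_per_interval: list[int], pkts_sent: list[int],
                  pkts_received: list[int], link_bps: float) -> dict[str, float]:
    utils = [utilization(o, link_bps) for o in octets_per_interval]
    drops = [drop_percentage(s, r) for s, r in zip(pkts_sent, pkts_received)]
    return {
        "mean_util": statistics.mean(utils),
        "stdev_util": statistics.pstdev(utils),  # population standard deviation
        "peak_util": max(utils),
        "peak_drop_pct": max(drops),
    }
```

Only these four numbers per link-direction per day are kept, which is what keeps the three-month history small.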
- Assuming 10 octets per item for storage, I estimate that the necessary 3 month
- history can be maintained with <5 Mbytes for a network with 100 routers, 500
- external interfaces, and 200 links.
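The storage estimate can be reproduced with a few lines of Python. The number of tracked applications is not given in the minutes, so the 20 used below is an assumption; the other inputs (10 octets per item, about 92 days, 500 external interfaces, 200 links) come from the text.

```python
from __future__ import annotations


def storage_octets(ext_interfaces: int, links: int, applications: int,
                   days: int = 92, octets_per_item: int = 10) -> int:
    """Octets needed for the rolling history described in section 4.2."""
    items_per_day = (
        ext_interfaces * 2     # input packets + input octets per interface
        + applications * 2     # input packets + octets per application
        + links * 2 * 4        # two directions x (mean, std dev, peak, peak drop)
    )
    return items_per_day * days * octets_per_item
```

`storage_octets(500, 200, 20)` gives 2,428,800 octets, comfortably under the 5 Mbyte figure. In this sketch the 100 routers only affect the name/map files, not the daily statistics.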
-
- 4.3. Reports/Presentation Tools:
-
- My hunch is that standardization of presentation tools will come about based
- on who does the work first. (It's hard to argue with decent code that's in
- place: to wit, the entire TCP/IP phenomenon.) Here are some suggestions
- (and the reasoning) for what we should do first.
-
- 4.3.1. McDonald's Report:
-
- For an N day period, graph Total Input Bytes per day. Put the average packet
- length as a "note" on the graph.
-
- Reason:
- Bytes are a better measure of the "useful" load carried by the network,
- i.e. the information sent around by the applications; packets are really an
- artifact of the way we do things. As a network manager, I would be interested
- in the end-user volume of information. By including the average packet length,
- one can convert to packet volumes if need be.
-
- For the same reason, I suggest that the next two reports be done in bytes as
- well. Note that the suggested initial information base will support
- comparable presentations by packets as well.
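A minimal Python sketch of the data behind the McDonald's Report, assuming the server delivers per-interface daily totals as `(interface, day, input_packets, input_octets)` tuples. The tuple layout is an assumption.

```python
from __future__ import annotations

from collections import defaultdict


def mcdonalds_report(records) -> tuple[dict[str, int], float]:
    """Network-wide input octets per day, plus the period's average packet length.

    records: iterable of (interface, day, input_packets, input_octets).
    """
    octets_per_day: dict[str, int] = defaultdict(int)
    total_pkts = total_octets = 0
    for _interface, day, pkts, octets in records:
        octets_per_day[day] += octets  # sum over all external interfaces
        total_pkts += pkts
        total_octets += octets
    avg_packet_len = total_octets / total_pkts if total_pkts else 0.0
    return dict(sorted(octets_per_day.items())), avg_packet_len
```

The daily totals are what gets graphed; dividing any of them by the average packet length gives the approximate packet volume, as suggested above.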
-
- 4.3.2. Offered Load by Customer Report:
-
- Based on total input bytes for an N day period: Graph the distribution (or
- density function) of total input bytes vs. external interfaces as shown below.
- The external interfaces should be put in decreasing order of offered load (in
- bytes).
-
- 4.3.3. Offered Load by Application Report
-
- Based upon total input bytes for the N day period, present a pie chart of the
- distribution by application.
-
- 4.3.4. Link Utilization
-
- The objective here is to provide some information on the utilization of the
- total set of links and on the "worst" link.
- The input "data" we have to work with comprises two matrices:
-
- A(i,j) = average utilization of link i on day j
- P(i,j) = peak (15 minute) utilization of link i on day j.
-
- Define TAVG(A(i)) = time average of A(i,j) (i.e. sum-over-j(A(i,j))/#days).
- Define TAVG(P(i)) = time average of P(i,j) (i.e. sum-over-j(P(i,j))/#days).
-
- I suggest that we order links by the TAVG(P(i)) measure, i.e. the "worst" link
- is the one that has the highest average peak utilization over the period.
- Graph the following:
-
- A histogram of the collection of A(i,j) values, using 10% buckets on the
- X-axis, i.e. plot the function F(n) where F(n) = percentage of A(i,j) entries
- in the (n-1)*10% -- n*10% range.
-
- A comparable histogram of the P(i,j).
-
- Histograms are useful for summarizing the data over all links over the entire
- period and can suggest further explorations.
- For the "worst link" (as defined above), plot as a function of day, its average
- utilization for the day and its peak utilization for the day. (Note that the
- data that we collect supports exploration of these time series for any link.)
-
- Note that the proposed initial information base will support such analyses for
- any subset of the links.
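The link-utilization definitions above translate directly into Python. This sketch assumes `A` and `P` are stored as mappings from a (hypothetical) link name to its list of daily values, with utilization expressed as a fraction from 0.0 to 1.0. Each bucket includes its lower edge, and a value of exactly 100% is counted in the top bucket.

```python
from __future__ import annotations


def tavg(row: list[float]) -> float:
    """TAVG: time average of one link's daily values over the period."""
    return sum(row) / len(row)


def worst_link(P: dict[str, list[float]]) -> str:
    """The link with the highest average daily peak utilization, TAVG(P(i))."""
    return max(P, key=lambda link: tavg(P[link]))


def histogram_10pct(M: dict[str, list[float]]) -> list[float]:
    """Percentage of all M(i,j) entries falling in each 10% bucket.

    Index n-1 of the result holds F(n), the share of entries in the
    (n-1)*10% .. n*10% range.
    """
    counts = [0] * 10
    for row in M.values():
        for u in row:
            counts[min(int(u * 10), 9)] += 1
    total = sum(counts)
    return [100.0 * c / total for c in counts]
```

The same `histogram_10pct` applied to `A` and to `P` gives the two histograms, and `A[worst_link(P)]` and `P[worst_link(P)]` are the two time series to plot for the worst link.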
-
- 4.3.5. Congestion
-
- The available data, as specified in section 4.1, is
-
- D(i,j) = peak drop rate (during any fifteen minute interval) for link i on
- day j.
-
- Plot a histogram of D(i,j). For the "worst" link (as defined above),
- say link I, plot D(I,j) as a function of j.
-
- 5. Presentations
-
- In addition to the groups on the monthly reports, we had presentations from
- Bill Norton of Merit and Chris Meyers of Wash. U.
- Chris proposed an exchange format. I'm guessing that the document is
- available on-line if you wish to review it.
- Bill discussed Merit's OpStats activities for NSFnet. He focussed on their
- presentation tools as well as the way that they internally organize the data
- (a tree structure of Unix files). One important point made during this
- discussion is that relational databases are not good for storing OpStats.
- (Performance is the issue.) This is unfortunate since many commercial DBMSs
- are relational in nature, and therefore, we cannot leverage their (usually
- substantial) report facilities. The idea of a "client/server" model grew out
- of Bill's presentation.
-
- 6. Notable and Quotable
-
- We had some discussion of how Network Managers use Management
- Reports and, therefore, what the reports need to present. One significant
- observation was that "Political Graphs don't have to make sense."
- During Sue Hares's presentation of her group's work on the monthly reports,
- the KISS acronym was re-interpreted as Keep It Simple Sue.
-
- Participants (who signed the list):
-
- Vikas Aggarwal <vikas@jvnc.net>
- Bill Barns <barns@gateway.mitre.org>
- Eric Carroll <eric@utcs.utoronto.edu>
- Charles Carvalho <charles@salt.acc.com>
- Bob Collet /PN=Robert.D.Collet/O=US.SPRINT/ADMD=TELEMAIL/C=US/@SPRINT.COM
- Dale Finkelson <dmf@westie.unl.edu>
- Dan Friedman <danfriedman@bbn.com>
- Demi Getschko <demi@fpsp.fapesp.br>
- Dave Geurs <dgeurs@mot.com>
- Fred Gray <fred@homer.msfc.nasa.gov>
- Phill Gross <pgross@nri.reston.va.us>
- Olafur Gudmundsson <ogud@cs.umd.edu>
- Steven Hunter <hunter@es.net>
- Dale S Johnson <dsj@merit.edu>
- Dan Jordt <danj@nwnet.net>
- Tracy LaQuey Parker <tracy@utexas.edu>
- Nik Langrind <nik@shiva.com>
- Walt Lazear <lazear@gateway.mitre.org>
- Dave O'Leary <oleary@sura.net>
- Dan Long <long@nic.near.net>
- Garry Malkin <gmalkin@ftp.com>
- Lynn Monsanto <monsanto@sun.com>
- Don Morris <morris@ucar.edu>
- Bill Norton <wbn@merit.edu>
- Rehmi Post <rehmi@ftp.com>
- Joel Replogle <replogle@ncsa.uiuc.edu>
- Robert J. Reschly Jr. <reschly@brl.mil>
- Ron Roberts <roberts@jessica.stanford.edu>
- Manoel A Rodriques <manoel.rodrigues@att.com>
- Jim Sheridan <jsheridan@ibm.com>
- Brad Solomon <bsolomon@hobbes.msfc.nasa.gov>
- Osmund deSouza <desouza@osdpc.ho.atcom> ???
- Mike Spengler <mks@msc.edu>
- Bob Stewart <rlstewart@eng.xyplex.com>
- Roxanne Streeter <streeter@nsipo.nasa.gov>
- Kannan Varadhan <kannan@oal.net>
- Ross Veach <???????>
- Sue Wang <swang@ibm.com>
- Carol Ward <cward@spot.colorado.edu>
- Cathy Withbrodt <cjw@nersc.gov> ?????
- Wing Wong <ww14706@malta.sbi.com>
-
-